RDF Data Descriptions

Authors

  • Georg Lausen
  • Michael Schmidt
Abstract

Linked Open Data (LOD) sources on the Web are becoming increasingly popular. RDF constraints can be used to characterize the RDF graphs provided by such sources. For applications that process data retrieved from several such RDF graphs, it becomes interesting to analyze the relationships between the different sets of constraints associated with the sources providing the graphs. In this short paper we discuss how the constraints from different sources can be aggregated into a set of constraints characterizing the union of the RDF graphs under consideration. For expressing constraints we use Datalog+/-.

The recent RDF Validation Workshop [9] identified a gap between what the current standards offer and what industry needs for the validation of RDF data. As a possible solution, and in continuation of our previous work [7], we have developed the constraint language RDD (RDF Data Descriptions) [8]. It captures a broad range of constraints, including keys, cardinalities, and subclass and subproperty restrictions, which makes it easy to implement RDD checkers and clears the way for semantic query optimization. The intention of an RDD is similar to that of Stardog ICV [1], where constraints are stated using OWL and considered relative to a certain inference machinery whose type may range from no inferencing through RDFS to OWL inferencing. In contrast, RDD uses a compact special-purpose syntax designed solely for expressing constraints, independent of any specific inference machinery. This makes RDD particularly applicable to RDF under ground semantics, which is a common scenario in the Linked Data context. While in [8] we considered a restricted scenario in which a single RDD defines the constraints holding in a single RDF graph, in this short paper we suggest broadening the view to a set of RDF graphs, each described by its own RDD defining a set of associated constraints.
The major difficulties in such a scenario arise because the information represented by the graphs may overlap, in the sense that certain resources may be described in more than one graph. To handle such situations, the notion of a context has been coined [6]. While [6] discusses RDF in different contexts with respect to information aggregation, our concern is the aggregation of constraints, which, to the best of our knowledge, has not been studied before.

Let us consider RDF graphs Ga and Gb and corresponding RDDs RDDa and RDDb, respectively. Both graphs are assumed to be consistent, i.e. all the constraints in the respective RDD are fulfilled. The main question we are interested in is how the union of the graphs Ga ∪ Gb is related to the union of the respective sets of constraints Σa and Σb. As Ga and Gb may contain triples referring to subjects with the same URI, in general Ga ∪ Gb will not fulfill all constraints in Σa ∪ Σb. For example, whenever a certain predicate p is defined to be single-valued in both RDDa and RDDb, two triples (s, p, o1) and (s, p, o2) may appear in the union Ga ∪ Gb, so the constraint is not guaranteed to hold in Ga ∪ Gb. As a solution for such cases we propose the aggregation of constraints, which in our example means that the single-valued constraint is replaced by a constraint restricting the number of values to at most 2. In general, given RDDs Σa and Σb, we are interested in constructing an RDD Γ(Σa, Σb) such that for any RDF graphs Ga and Gb with Ga |= Σa and Gb |= Σb, we have Ga |= Γ(Σa, Σb), Gb |= Γ(Σa, Σb), and Ga ∪ Gb |= Γ(Σa, Σb). The task of deriving Γ(Σa, Σb) is called constraint aggregation.

Recently, Cortés-Calabuig and Paredaens [4] have presented a constraint language for RDF equipped with deductive rules for equality-generating and tuple-generating dependencies.
However, as can be seen from the following example, their constraint language is not general enough to be used for an RDD. For this reason we have chosen the framework of Datalog+/- [2], which offers the needed expressiveness.

In Figure 1 we exhibit two RDDs describing RDF graphs representing employees and projects. To demonstrate constraint aggregation, let us consider the predicate reportsTo, which is defined via a path constraint in RDDa. RDDb requires that each employee report to exactly two objects. These constraints are written in Datalog+/- as follows (predicate names are abbreviated):

    G($s, rT, $o) → ∃$o1 (G($s, wF, $o1), G($o1, aT, $o))
    G($s, rT, $o1), G($s, rT, $o2), G($s, rT, $o3) → $o1 = $o2 ∨ $o1 = $o3 ∨ $o2 = $o3
    G($s, type, E) → ∃$o1, $o2 (G($s, rT, $o1), G($s, rT, $o2), $o1 ≠ $o2)

Moreover, RDDa defines worksFor and assignedTo as total functions. Therefore it can be inferred that reportsTo is a partial function, even though this is not stated in RDDa. Using this additional information, by constraint aggregation we get a min(0)- and max(3)-constraint for predicate reportsTo. Note that without the inferred constraint, reportsTo has to be considered unrestricted, and therefore only the trivial constraint max(∞) can be derived by constraint aggregation.

RDDa:

    PREFIX ex:
    CWA CLASS ex:Employee {
      KEY rdfs:label : LITERAL
      PARTIAL ex:employedBy : RESOURCE
      MAX(2) ex:prevEmployedBy : RESOURCE
      TOTAL ex:worksFor, RANGE (ex:Project)
      PATH(ex:worksFor/ex:assignedTo) ex:reportsTo, RANGE(ex:Consortium)
    }
    CWA CLASS ex:Project {
      TOTAL ex:assignedTo, RANGE(ex:Consortium)
    }

RDDb:

    PREFIX ex:
    CWA CLASS ex:Employee {
      KEY rdfs:label : LITERAL
      PARTIAL ex:employedBy : RESOURCE
      ex:prevEmployedBy : RESOURCE, SUBPROPERTY employedBy
      MIN(2), MAX(2) ex:reportsTo, RANGE(ex:Association)
    }

Fig. 1. RDDa (top) and RDDb (bottom) describing employees in different contexts. See [8] for a detailed explanation of the used concepts.
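The reportsTo example can be replayed on toy data. The following is a minimal sketch, not the authors' implementation: it assumes that aggregating two cardinality constraints takes the smaller of the two lower bounds (each graph alone must still satisfy the result) and the sum of the two upper bounds (a subject may collect values from both graphs). The names `Card`, `aggregate`, `values`, and `satisfies`, as well as the sample triples, are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Card:
    """Cardinality bounds of one predicate on one class (hypothetical helper)."""
    lo: int = 0
    hi: float = inf


def aggregate(a: Card, b: Card) -> Card:
    # Assumption: the lower bound must hold in each graph alone, so take the
    # minimum; values from both graphs may accumulate, so upper bounds add up.
    return Card(min(a.lo, b.lo), a.hi + b.hi)


def values(graph, s, p):
    """All objects o with (s, p, o) in the graph."""
    return {o for (s2, p2, o) in graph if (s2, p2) == (s, p)}


def satisfies(graph, cls, p, card):
    """Check the bounds for every subject typed as cls in the graph."""
    members = {s for (s, p2, o) in graph if p2 == "type" and o == cls}
    return all(card.lo <= len(values(graph, s, p)) <= card.hi for s in members)


# Toy graphs using the abbreviated predicate names (wF, aT, rT) from the text.
Ga = {("e1", "type", "E"), ("e1", "wF", "p1"), ("p1", "aT", "c1"), ("e1", "rT", "c1")}
Gb = {("e1", "type", "E"), ("e1", "rT", "a1"), ("e1", "rT", "a2")}

card_a = Card(0, 1)  # reportsTo inferred to be a partial function in RDDa
card_b = Card(2, 2)  # MIN(2), MAX(2) in RDDb
agg = aggregate(card_a, card_b)

print(agg)
for name, g in [("Ga", Ga), ("Gb", Gb), ("Ga ∪ Gb", Ga | Gb)]:
    print(name, satisfies(g, "E", "rT", agg))
print("union under RDDb bounds:", satisfies(Ga | Gb, "E", "rT", card_b))
```

With an unrestricted first operand, `aggregate(Card(), card_b)` keeps an upper bound of infinity, which mirrors the trivial max(∞) constraint mentioned above.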
Formally, inferring constraints in the Datalog+/- framework can be done based on the chase procedure [5]. We are currently investigating termination and complexity of the corresponding constraint implication problem. However, constraint aggregation as proposed in this paper is in itself independent of the concrete constraint language considered. For example, [3] demonstrates termination and efficiency of the required chase procedure for a wide range of ontology languages.
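To illustrate how a chase-based argument could derive that reportsTo is functional, here is a deliberately naive sketch restricted to the two rule shapes used in the example: path rules (tuple-generating) and functionality of a predicate (equality-generating). It is an assumption-laden illustration rather than the procedure of [5]: the function `chase`, its parameters, and the null naming scheme are hypothetical, the frozen body variables are treated as mergeable terms, and termination is not guaranteed for arbitrary rule sets.

```python
from itertools import count


def chase(facts, functional, path_rules):
    """Naive chase sketch over triples (s, p, o).

    path_rules: (head, first, second) encodes
        G(s, head, o) -> exists m: G(s, first, m), G(m, second, o)
    functional: predicates with at most one value per subject.
    """
    facts = set(facts)
    fresh = count()
    changed = True
    while changed:
        changed = False
        # Tuple-generating step: invent a labelled null if no witness exists.
        for head, first, second in path_rules:
            for s, p, o in list(facts):
                if p != head:
                    continue
                if not any((s, first, m) in facts
                           for (m, q, t) in facts if q == second and t == o):
                    m = f"_n{next(fresh)}"
                    facts |= {(s, first, m), (m, second, o)}
                    changed = True
        # Equality-generating step: merge two values of a functional predicate.
        for p in functional:
            seen = {}
            for s, q, o in sorted(facts):
                if q != p:
                    continue
                if s in seen and seen[s] != o:
                    keep = seen[s]
                    facts = {(keep if a == o else a, r, keep if b == o else b)
                             for a, r, b in facts}
                    changed = True
                    break
                seen[s] = o
    return facts


# Frozen body of the candidate constraint "reportsTo is functional".
body = {("s", "rT", "o"), ("s", "rT", "o2")}
rules = [("rT", "wF", "aT")]

with_functions = chase(body, ["wF", "aT"], rules)
without_functions = chase(body, [], rules)

print({o for (_, p, o) in with_functions if p == "rT"})
print({o for (_, p, o) in without_functions if p == "rT"})
```

In the first run the two reportsTo values collapse into one term, so the functionality of reportsTo follows from the path constraint together with the functionality of worksFor and assignedTo. In the second run they stay distinct, which matches the remark that reportsTo is otherwise unrestricted.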

Similar resources

The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases

ABSTRACT Metadata are widely used in order to fully exploit information resources available on corporate intranets or the Internet. The Resource Description Framework (RDF) aims at facilitating the creation and exchange of metadata as any other Web data. The growing number of available information resources and the proliferation of description services in various user communities, lead nowadays...

Necessities on a Descriptive Level for Reusing Metadata Descriptions

The present RDF is opaque in reusing the descriptions in ready-made metadata repositories. To enhance reusability of descriptions in RDF/XML, we need a unit for packaging a set of descriptions on a semantic schema level. And, we may need a new rule to handle a composite unit of not linearfold but link-based or graph data structure, and serialize it to reuse.

Domain-Adaptable Hybrid Generation of RDF Entity Descriptions

RDF ontologies provide structured data on entities in many domains and continue to grow in size and diversity. While they can be useful as a starting point for generating descriptions of entities, they often miss important information about an entity that cannot be captured as simple relations. In addition, generic approaches to generation from RDF cannot capture the unique style and content of...

Functional Queries to Wrapped Educational Semantic Web Meta-Data

The aim of the Edutella project is to provide a peer-to-peer infrastructure for educational material retrieval using semantic web meta-data descriptions of educational resources. Edutella uses the semantic web meta-data description languages RDF and RDF-Schema for describing web resources. The aim of this work is to wrap the Edutella infrastructure with a functional mediator system. This makes ...

Toward RDF Normalization

Billions of RDF triples are currently available on the Web through the Linked Open Data cloud (e.g., DBpedia, LinkedGeoData and New York Times). Governments, universities as well as companies (e.g., BBC, CNN) are also producing huge collections of RDF triples and exchanging them through different serialization formats (e.g., RDF/XML, Turtle, N-Triple, etc.). However, RDF descriptions (i.e., gra...

XML-Based RDF Query Language (XRQL) and its Implementation

Resource Description Framework (RDF) is a language which represents information about resources. In order to search RDF resource descriptions, several RDF query languages such as RQL and SquishQL have been proposed. However, these RDF query languages do not use XML syntax and they have limited functionality. xRQL is proposed to solve these issues by defining an XML-based RDF query language with...

Publication date: 2014